Abstract


One of the Grand Challenges of the Federal High Performance Computing 
and Communications (HPCC) Program is in remote exploration and exper- 
imentation (REE). The goal of the REE Project is to develop a space-borne 
computing technology base that will enable the next generation of missions 
to explore the Earth and the Solar System. This paper discusses an ongoing 
study that uses a recent development in communication control technology 
to implement hybrid hypercube structures. These architectures are similar to 
binary hypercubes, but they also provide added connectivity between the pro- 
cessors. This added connectivity increases communication reliability while 
decreasing the latency of interprocessor message passing. Because these fac- 
tors directly determine the speed that can be obtained by multiprocessor sys- 
tems, these architectures are attractive for applications such as REE, where 
high performance and ultrareliability are required. This paper describes and 
enumerates these architectures and discusses how they can be implemented 
with a modified version of the hyperswitch communication network (HCN). 
The HCN is analyzed because it has three attractive features that enable these 
architectures to be effective: speed, fault tolerance, and the ability to pass
multiple messages simultaneously through the same hyperswitch controller.


1. Introduction 

One of the Grand Challenges of the Federal 
High Performance Computing and Communications 
(HPCC) Program is in the area of remote exploration 
and experimentation (REE). The goal of the REE 
Project is to develop a space-borne computing tech- 
nology base that will enable high-performance, fault- 
tolerant, adaptive space systems for a new genera- 
tion of missions to explore the Earth and the Solar 
System. The specific objectives of the REE Project 
are to demonstrate that a thousandfold increase in 
performance is feasible and to identify a parallel, 
scalable architecture that can incorporate new technologies
to meet a broad range of requirements. As
described in The Remote Exploration and Experi- 
mentation Project Plan by the Jet Propulsion Labo- 
ratory, the architecture must also provide affordable 
fault tolerance and long-term reliability in an envi- 
ronment of limited power and weight, high radiation, 
and no maintainability. To meet these objectives, 
new architectures must be investigated with consid- 
eration given to REE-type applications. 

This paper discusses an ongoing study that at- 
tempts to use a recent development in hypercube 
communications control technology, the hyperswitch 
communication network (HCN) chip set (ref. 1), to 
implement a variety of generalized and hybrid hyper- 
cube architectures. These architectures are similar to 
binary hypercubes, but they also provide added connectivity
between the processors. This added connectivity
increases communication reliability while
decreasing the latency incurred when passing messages
between processors. Because these factors directly
determine the speed that can be obtained with
multiprocessor systems, these architectures are attractive
for applications such as REE, where high
performance and ultrareliability are required.

This paper describes and enumerates these archi- 
tectures and discusses how they can be implemented 
with a modified version of the HCN chip set devel- 
oped at the Jet Propulsion Laboratory. The HCN 
chip set is analyzed here because it has three attrac- 
tive features that enable these architectures to be 
effective: speed, fault tolerance, and ability to pass 
multiple messages simultaneously through the same 
hyperswitch controller. 

This paper is organized as follows. Section 2 de- 
scribes generalized interconnection networks: both 
their organization and their relation to binary hyper- 
cube implementations. Expressions are given for 
the number of links, the number of disjoint paths 
between nodes, and other characteristic indices. 
Section 3 describes the hyperswitch communication 
network chip set: both its capabilities and its lim- 
itations. Section 4 describes and enumerates the 
possible generalized hypercubes that become feasible 
when hyperswitch technology is used in the network 
input/output (I/O) elements. Section 5 describes 
how the HCN chips can be modified to implement 
these architectures. Section 6 presents the benefits 
of these networks when used for multiple instruction
multiple data (MIMD) architectures and how these 
networks can be used to increase system performance 
and reliability. 



features: the ability to pass multiple messages simultaneously
through the same hyperswitch (up to 11),
the ability to reroute around busy channels, and, most
importantly, the ability to reroute these messages
quickly (less than 200 µsec for 512-byte messages).

The hyperswitch chip set (HSP) (fig. 5) consists
of a custom hyperswitch (crossbar) element (HS), a
hyperswitch I/O element (HSIO), and a message dispatch
processor element (DP) (ref. 5). The HSP interfaces
with other HSP’s through 11 bidirectional
channels (Ch0 to Ch10). These chips were designed
specifically to provide fast dynamic circuit- and
packet-switching capabilities in binary hypercube
architectures.



Figure 5. Hyperswitch processor (channels Ch0, Ch1, Ch2, ..., Ch10).


In circuit-switching mode, the HSP establishes 
a path from source to destination before message 
transmission. This path is established by emitting a 
circuit probe (1 to 4 bytes) from the source node. The 
probe contains the destination node address, message 
length information, distance information, and some 
history information in case backtracking is required 
to establish the virtual link. The probe is then 
sent through intermediate nodes to the destination 
and the virtual link is established. At this time, 
the message itself can be transmitted across the 
virtual link at a rate equal to the link bandwidth. 
For circuit-switching mode, the message transmission 
latency T_{ckt} is

T_{ckt} = (S_{probe} H / B_{link}) + (S_{msg} / B_{link})    (4)

where S_{probe} is the size of the probe, H is the number
of hops in the virtual link, B_{link} is the bandwidth of
the links, and S_{msg} is the size of the message.


In packet-switching mode, the HSP passes an en- 
tire message as a packet or set of packets, just as 
it passes a probe in circuit-switching mode. For 
packet-switching mode, the message transmission
latency T_{pkt} is

T_{pkt} = S_{pkt} N H / B_{link}    (5)

where S_{pkt} is the size of each packet, and N is
the number of packets required to send the entire
message.
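As a rough numerical illustration of equations (4) and (5), the following
Python sketch evaluates both latencies, reading each term as a size divided
by the link bandwidth. The function names, the units (bytes and bytes per
second), and the sample values are assumptions chosen for illustration; they
are not taken from the HCN specification.

```python
def circuit_latency(s_probe, hops, s_msg, b_link):
    """Equation (4): the probe crosses `hops` links, then the message
    streams once over the established virtual link."""
    return (s_probe * hops) / b_link + s_msg / b_link


def packet_latency(s_pkt, n_packets, hops, b_link):
    """Equation (5): each of the `n_packets` packets is forwarded
    across every one of the `hops` links."""
    return (s_pkt * n_packets * hops) / b_link


if __name__ == "__main__":
    # Hypothetical values: 4-byte probe, 512-byte message, 5 hops,
    # 10e6 bytes/s links, message split into four 128-byte packets.
    b_link = 10e6
    t_ckt = circuit_latency(s_probe=4, hops=5, s_msg=512, b_link=b_link)
    t_pkt = packet_latency(s_pkt=128, n_packets=4, hops=5, b_link=b_link)
    print(f"circuit switching: {t_ckt * 1e6:.1f} usec")
    print(f"packet switching:  {t_pkt * 1e6:.1f} usec")
```

Under these idealized expressions the circuit-switched message pays the
per-hop cost only for the short probe, whereas the packet-switched message
pays it for every byte; neither function models busy or failed links.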

In busy networks, both equations (4) and (5) must 
be extended to include the effects of encountering
busy or failed links when establishing a path from 
source to destination. When a busy or failed link is 
encountered, one of three options is available: buffer 
the message until the link becomes available, drop the 
transaction and try again at a later time, or detour 
around the link. Each of these options increases the 
overall message latency. 

Each HSP has 11 hyperswitch elements that act as 
the I/O ports for each node in the hypercube. Therefore,
for binary hypercubes, the maximum number of
nodes is 2^{11} (2048) because only one port is needed
for each dimension. For nonbinary (e.g., generalized)
hypercubes, a slightly different interpretation is discussed
in section 4. For each hyperswitch, an HSIO
performs the parallel-to-serial and serial-to-parallel
conversion of the 8-bit data that travel between the
hyperswitch and serial links that connect to neighboring
HSP’s (up to 11 serial links connect every node).

The DP is a Motorola MC88000 32-bit reduced 
instruction set computer (RISC), which can provide 
17 million instructions per second. The DP performs 
transfers to and from system memory and acts as 
the interface between the HSP and the application 
processor. This processor also controls all crossbar 
settings in the hyperswitches of the HSP when es- 
tablishing paths from source to destination during 
message transmission. The DP can act as the appli- 
cation processor as well. 

Message routing latency is reduced with an adap- 
tive backtracking algorithm implemented in the DP. 
This algorithm automatically avoids congested links 
based on its current knowledge of congestion in the 
network. When a message encounters a busy link, 
it does not wait for the link to become idle; instead, 
it tries to reach the destination by backtracking to 
the previous intermediate node and departing from 
another port. Virtual links between nodes are es- 
tablished by the switching elements in the HSP’s of 
each node. This dynamic routing method has been 
shown to significantly reduce message routing over- 
head as well as increase the communication reliability 
because of the ability to backtrack and avoid busy or
faulty network links (ref. 4). 
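The backtracking behavior described above can be sketched as a depth-first
probe on a binary hypercube. This is only an illustrative model, not the
algorithm coded in the DP: the function name `probe_path`, the representation
of busy links as a set, and the port-ordering rule (ports that reduce the
distance to the destination are tried first) are all assumptions.

```python
def probe_path(src, dst, dim, busy):
    """Depth-first probe with backtracking on a binary hypercube.

    Nodes are integers in range(2**dim); port p connects a node to the
    node whose address differs in bit p. `busy` is a set of
    frozenset({a, b}) links that cannot be used. Returns the list of
    nodes on the established path, or None if no path is found.
    """
    visited = {src}   # history carried by the probe
    path = [src]

    def step(node):
        if node == dst:
            return True
        diff = node ^ dst
        # Ports that move closer to the destination come first;
        # the remaining ports serve as detours.
        ports = sorted(range(dim), key=lambda p: not (diff >> p) & 1)
        for p in ports:
            nxt = node ^ (1 << p)
            if nxt in visited or frozenset((node, nxt)) in busy:
                continue          # do not wait on a busy link
            visited.add(nxt)
            path.append(nxt)
            if step(nxt):
                return True
            path.pop()            # backtrack to the previous node
        return False

    return path if step(src) else None


if __name__ == "__main__":
    busy = {frozenset((1, 3)), frozenset((1, 5))}
    print(probe_path(0, 7, dim=3, busy=set()))
    print(probe_path(0, 7, dim=3, busy=busy))
```

In the second call, node 1 has no usable outgoing link, so the probe returns
to node 0 and departs from another port instead of waiting.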

4. Generalized Structures and the HCN 

Using an HSP as the I/O controller at each node 
of a generalized hypercube architecture allows a wide 
variety of configurations to be implemented. As dis- 
cussed previously, each HSP has 11 I/O ports that 
can be used to interconnect a number of processing 
sites. The chip set specification states that one of
these ports should be used for diagnostic purposes; 
that is, it should be connected to itself and periodi- 
cally have test data run through the port. The other 
10 ports are then free to be interconnected to the 
HSPs of other nodes in the system. 

Therefore, we can now calculate the number of
possible generalized hypercube architectures that can
be constructed with a maximum of 10 ports per node.
This number equals the number of unique integer
partitions of 10 plus the number of unique integer
partitions of every integer less than 10.
An integer partition of an integer r is a way of writing
r as a sum of positive integers, without regard to order. Thus,
each generalized hypercube that can be implemented
with the hyperswitch can be represented
by a set of positive integers whose sum is less than or
equal to 10. For example, the partition {2, 2, 3, 3} is
an integer partition of 10. The corresponding four-dimensional
generalized hypercube is a (3, 3, 4, 4) configuration
consisting of 3 x 3 x 4 x 4 = 144 nodes. The integers in
the partition correspond to the number of ports required
in each dimension, so a dimension that requires
p ports contains p + 1 nodes.
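The correspondence between partitions and configurations can be sketched in
a few lines of Python. The helper names `partitions` and `configuration` are
hypothetical, and the sketch assumes, as the (3, 3, 4, 4) example suggests,
that a dimension using p ports per node contains p + 1 nodes.

```python
from math import prod


def partitions(r, largest=None):
    """Yield each integer partition of r as a non-increasing tuple."""
    if largest is None:
        largest = r
    if r == 0:
        yield ()
        return
    for part in range(min(r, largest), 0, -1):
        for rest in partitions(r - part, part):
            yield (part,) + rest


def configuration(partition):
    """Map ports per dimension to nodes per dimension (p ports -> p + 1 nodes)."""
    return tuple(sorted(p + 1 for p in partition))


if __name__ == "__main__":
    cfg = configuration((2, 2, 3, 3))
    print(cfg, prod(cfg))            # configuration and its node count
    for part in partitions(4):       # every architecture using 4 ports/node
        print(part, "->", configuration(part))
```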

From reference 6, the number of unique integer
partitions of a number r is obtained from the coefficient
of x^r in the following generating function:

G(x) = \prod_{m=1}^{\infty} \sum_{k=0}^{\infty} x^{km}    (6)

Specifically, for r \le 10,

G(x) = (1 + x + x^2 + ... + x^8 + x^9 + x^{10})
       \times (1 + x^2 + x^4 + x^6 + x^8 + x^{10})
       \times (1 + x^3 + x^6 + x^9)(1 + x^4 + x^8)
       \times (1 + x^5 + x^{10})(1 + x^6)(1 + x^7)
       \times (1 + x^8)(1 + x^9)(1 + x^{10})    (7)

or

G(x) = 1 + x + 2x^2 + 3x^3 + 5x^4 + 7x^5
       + 11x^6 + 15x^7 + 22x^8 + 30x^9 + 42x^{10}    (8)


In equations (7) and (8), all terms with powers
larger than 10 have been eliminated, because 10 is 
the maximum r we are interested in for this example. 
Furthermore, the generating function in equation (8) 
indicates the number of possible architectures with 
respect to the number of ports required per node 
(table 3). Finally, we can calculate the total number 
of generalized hypercube architectures possible by 
simply adding the coefficients of equation (8) as 
follows: 

1 + 1 + 2 + 3 + 5 + 7 + 11 + 15 + 22 + 30 + 42 = 139
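The coefficients and their sum can be checked by multiplying out the
truncated factors of the generating function numerically. The following is
a minimal sketch; the function name `partition_counts` is hypothetical, and
polynomials are represented simply as lists of coefficients indexed by power.

```python
def partition_counts(r_max):
    """Coefficients of G(x) up to x**r_max, obtained by multiplying the
    factors (1 + x**m + x**(2m) + ...) for m = 1, ..., r_max."""
    coeffs = [1] + [0] * r_max           # start from the polynomial 1
    for m in range(1, r_max + 1):
        product = [0] * (r_max + 1)
        for power, c in enumerate(coeffs):
            # Multiply the term c * x**power by each x**(k*m),
            # dropping every power larger than r_max.
            for k in range(power, r_max + 1, m):
                product[k] += c
        coeffs = product
    return coeffs


if __name__ == "__main__":
    counts = partition_counts(10)
    print(counts)        # number of architectures for 0..10 ports per node
    print(sum(counts))   # total number of architectures
```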

Table 3. Possible Generalized Hypercubes

Number of ports/node       0   1   2   3   4   5   6   7   8   9   10
Number of architectures    1   1   2   3   5   7   11  15  22  30  42


These architectures are listed in the appendix 
(with the exception of the trivial architecture that 
has 0 ports per node) and grouped according to 
the number of dimensions. The one-dimensional ar- 
chitectures in the appendix represent the fully con- 
nected systems that can be implemented. In addi- 
tion to the list in the appendix, a large number of 
hyperrectangular and hybrid hypercubes can be constructed.
Again, the only constraint imposed is the
number of I/O ports required per node. 

Architectures can now be chosen based on the 
characteristics of the application. For example, con- 
sider an application with three distinct distributed 
components: A, B, and C, listed in order of increasing
communication bandwidth requirements.
Choose a three-dimensional architecture with
the processors in dimension 1 connected in a ring, 
processors in dimension 2 connected in a mesh, and 
processors in dimension 3 fully connected. Finally, 
map component A onto the processors in dimen- 
sion 1, component B onto the processors in dimen- 
sion 2, and component C onto the processors in di- 
mension 3. Choosing the number of processors in 
each dimension now depends on the amount of paral- 
lelism inherent in the corresponding distributed com- 
ponents of the application. 

5. Modifying HSP Element 

To implement generalized hypercubes with the 
hyperswitch network element (fig. 5), two issues must 
be addressed. The first issue relates to the header 
information within the probes and message packets. 
The second issue requires changes in the coding of 
the DP as well as any hardwired functions pertain- 
ing to the architecture being configured (neighbor ad- 
dresses) and the routing algorithm used. 
Appendix

Generalized Hypercubes With the HCN 

Tables A1 to A10 list the generalized hypercubes that can be implemented with a modified version of
the hyperswitch communication network (HCN). Architectures are described by the generalized hypercube
representation (which conveys the number of nodes in each dimension and the number of dimensions d), the
number of I/O ports required for each node P, the number of bits required to represent the node addresses B_g,
and the total number of nodes in the topology N.
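The tabulated quantities can be recomputed from a configuration with a short
Python sketch. The function name `describe` is hypothetical. P and N follow
from the definitions above; the expression used for B_g (one address field per
dimension, each wide enough to number the nodes in that dimension) is an
assumption inferred from the tabulated values rather than a formula stated in
the text.

```python
from math import prod


def describe(config):
    """Return (P, B_g, N) for a generalized hypercube configuration,
    given as the number of nodes in each dimension."""
    # Each node links to the other m - 1 nodes in every dimension.
    ports = sum(m - 1 for m in config)
    # Assumed: ceil(log2(m)) address bits per dimension, computed
    # exactly with integer arithmetic as (m - 1).bit_length().
    addr_bits = sum((m - 1).bit_length() for m in config)
    nodes = prod(config)
    return ports, addr_bits, nodes


if __name__ == "__main__":
    for config in [(2,) * 7, (2, 2, 3, 3, 4), (2, 2, 2, 2, 7)]:
        print(config, describe(config))
```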


Table A1. Ten-Dimensional Generalized Hypercubes


Table A3. Eight-Dimensional Generalized Hypercubes

Configuration              P    B_g   N
2, 2, 2, 2, 2, 2, 2, 2     8    8     256
2, 2, 2, 2, 2, 2, 2, 3     9    9     384
2, 2, 2, 2, 2, 2, 3, 3     10   10    576
2, 2, 2, 2, 2, 2, 2, 4     10   10    512


Table A4. Seven-Dimensional Generalized Hypercubes

Configuration              P    B_g   N
2, 2, 2, 2, 2, 2, 2        7    7     128
2, 2, 2, 2, 2, 2, 3        8    8     192
2, 2, 2, 2, 2, 3, 3        9    9     288
2, 2, 2, 2, 2, 2, 4        9    8     256
2, 2, 2, 2, 3, 3, 3        10   10    432
2, 2, 2, 2, 2, 3, 4        10   9     384
2, 2, 2, 2, 2, 2, 5        10   9     320


Table A5. Six-Dimensional Generalized Hypercubes


Table A6. Five-Dimensional Generalized Hypercubes

Configuration              P    B_g   N
2, 2, 2, 2, 2              5    5     32
2, 2, 2, 2, 3              6    6     48
2, 2, 2, 3, 3              7    7     72
2, 2, 2, 2, 4              7    6     64
2, 2, 3, 3, 3              8    8     108
2, 2, 2, 3, 4              8    7     96
2, 2, 2, 2, 5              8    7     80
2, 3, 3, 3, 3              9    9     162
2, 2, 3, 3, 4              9    8     144
2, 2, 2, 4, 4              9    7     128
2, 2, 2, 3, 5              9    8     120
2, 2, 2, 2, 6              9    7     96
3, 3, 3, 3, 3              10   10    243
2, 3, 3, 3, 4              10   9     216
2, 2, 3, 4, 4              10   8     192
2, 2, 3, 3, 5              10   9     180
2, 2, 2, 4, 5              10   8     160
2, 2, 2, 3, 6              10   8     144
2, 2, 2, 2, 7              10   7     112